{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reinforcement Learning Glossary \n",
    "\n",
    "We have learned several important and fundamental concepts of Reinforcement learning. In this section, we will revisit the several important terms and terminologies that are very useful for understanding upcoming chapters. \n",
    "\n",
    "__Agent__ - Agent is the software program that learns to make intelligent decisions. Example: A software program that plays the chess game intelligently. \n",
    "\n",
    "__Environment__ - The environment is the world of the agent. A chessboard is called the environment when the agent plays the chess game.\n",
    "\n",
    "__State__ -  A state is a position or a moment in the environment where the agent can be in. Example: All the positions in the chessboard are called the states. \n",
    "\n",
    "__Action__ - The agent interacts with the environment by performing an action and move from one state to another. Example: Moves made by the chessman can be considered an action. \n",
    "\n",
    "__Reward__ -  A reward is a numerical value that the agent receives based on its action. Consider reward as a point. For instance, an agent receives +1 point (reward) for good action and -1 point (reward) for a bad action. \n",
    "\n",
    "__Action space__ - A set of all possible actions in the environment is called action space. The action space is called a discrete action space when our action space consists of discrete actions and the action space is called continuous action space when our actions space consists of continuous actions.\n",
    "\n",
    "__Policy__  - The agent makes a decision based on the policy. A policy tells the agent what action to perform in each state. It can be considered as the brain of an agent. A policy is called deterministic policy if it exactly maps a state to a particular action. Unlike deterministic policy, stochastic policy maps the state to a probability distribution over the action space. An optimal policy is the one that gives the maximum reward. \n",
    "\n",
    "__Episode__ - Agent environment interaction starting from initial state to terminal state is called the episode. An episode is often called the trajectory or rollout. \n",
    "\n",
    "__Episodic and continuous task__ - A reinforcement learning task is called episodic task if it has the terminal state and it is called a continuous task if it does not has a terminal state. \n",
    "\n",
    "__Horizon__ - Horizon can be considered as an agent's life span, that is, time step until which the agent interacts with the environment. Horizon is called finite horizon if the agent environment interaction stops at a particular time step and it is called an infinite horizon when the agent environment interaction continues forever. \n",
    "\n",
    "__Return__ - Return is the sum of rewards received by the agent in an episode.\n",
    "\n",
    "__Discount factor__ - Discount factor helps to control whether we want to give importance to the immediate reward or future reward. The value of the discount factor ranges from 0 to 1. A discount factor close to 0 implies that we give more importance to immediate reward while a discount factor close to 1 implies that we give more importance to future reward than the immediate reward.\n",
    "\n",
    "__Value function__ - Value function or the value of the state is the expected return that the agent would get starting from the state $s$ following the policy $\\pi#. \n",
    "\n",
    "__Q function__ - Q function or the value of a state-action pair implies the expected return agent would obtain starting from the state $s$ and an action $a$ following the policy $\\pi$. \n",
    "\n",
    "__Model-based and Model-free learning__ - When the agent tries to learn the optimal policy with the model dynamics then it is called model-based learning and when the agent tries to learn the optimal policy without the model dynamics then it is called model-free learning. \n",
    "\n",
    "__Deterministic and Stochastic environment__ - When an agent performs an action $a$ in the state $s$ and if it reaches the state  exactly every time, then the environment is called a deterministic environment. When an agent performs an action $a$ in the state $s$ and if it reaches different states every time based on some probability distribution then the environment is called stochastic environment. \n",
    "\n",
    "Thus, in this chapter, we have learned several important and fundamental concepts of reinforcement learning. In the next chapter, we will begin our Hands-on reinforcement learning journey by implementing all the fundamental concepts we have learned in this chapter using the popular toolkit called the gym. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
